AITopics | action primitive

Collaborating Authors

action primitive

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings

Zhang, Jiajie, Schwertfeger, Sören, Kleiner, Alexander

arXiv.org Artificial IntelligenceDec-2-2025

W e present a novel unsupervised framework to unlock vast unlabeled human demonstration data from continuous industrial video streams for Vision-Language-Action (VLA) model pre-training. Our method first trains a lightweight motion tokenizer to encode motion dynamics, then employs an unsupervised action segmenter leveraging a novel "Latent Action Energy" metric to discover and segment semantically coherent action primitives. The pipeline outputs both segmented video clips and their corresponding latent action sequences, providing structured data directly suitable for VLA pre-training. Evaluations on public benchmarks and a proprietary electric motor assembly dataset demonstrate effective segmentation of key tasks performed by humans at workstations. Further clustering and quantitative assessment via a Vision-Language Model confirm the semantic coherence of the discovered action primitives. T o our knowledge, this is the first fully automated end-to-end system for extracting and organizing VLA pre-training data from unstructured industrial videos, offering a scalable solution for embodied AI integration in manufacturing.

action primitive, artificial intelligence, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2511.21428

Country: Asia > China (0.28)

Genre:

Research Report (0.64)
Workflow (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

RoboPilot: Generalizable Dynamic Robotic Manipulation with Dual-thinking Modes

Liu, Xinyi, Sani, Mohammadreza Fani, Zhou, Zewei, Wirbel, Julius, Zarrin, Bahram, Galeazzi, Roberto

arXiv.org Artificial IntelligenceOct-2-2025

Despite rapid progress in autonomous robotics, executing complex or long-horizon tasks remains a fundamental challenge. Most current approaches follow an open-loop paradigm with limited reasoning and no feedback, resulting in poor robustness to environmental changes and severe error accumulation. We present RoboPilot, a dual-thinking closed-loop framework for robotic manipulation that supports adaptive reasoning for complex tasks in real-world dynamic environments. RoboPilot leverages primitive actions for structured task planning and flexible action generation, while introducing feedback to enable replanning from dynamic changes and execution errors. Chain-of-Thought reasoning further enhances high-level task planning and guides low-level action generation. The system dynamically switches between fast and slow thinking to balance efficiency and accuracy. To systematically evaluate the robustness of RoboPilot in diverse robot manipulation scenarios, we introduce RoboPilot-Bench, a benchmark spanning 21 tasks across 10 categories, including infeasible-task recognition and failure recovery. Experiments show that RoboPilot outperforms state-of-the-art baselines by 25.9\% in task success rate, and the real-world deployment on an industrial robot further demonstrates its robustness in real-world settings.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.00154

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.67)

Add feedback

FUNCanon: Learning Pose-Aware Action Primitives via Functional Object Canonicalization for Generalizable Robotic Manipulation

Xu, Hongli, Zhang, Lei, Hu, Xiaoyue, Zhong, Boyang, Bai, Kaixin, Márton, Zoltán-Csaba, Bing, Zhenshan, Chen, Zhaopeng, Knoll, Alois Christian, Zhang, Jianwei

arXiv.org Artificial IntelligenceSep-24-2025

General-purpose robotic skills from end-to-end demonstrations often leads to task-specific policies that fail to generalize beyond the training distribution. Therefore, we introduce FunCanon, a framework that converts long-horizon manipulation tasks into sequences of action chunks, each defined by an actor, verb, and object. These chunks focus policy learning on the actions themselves, rather than isolated tasks, enabling compositionality and reuse. To make policies pose-aware and category-general, we perform functional object canonicalization for functional alignment and automatic manipulation trajectory transfer, mapping objects into shared functional frames using affordance cues from large vision language models. An object centric and action centric diffusion policy FuncDiffuser trained on this aligned data naturally respects object affordances and poses, simplifying learning and improving generalization ability. Experiments on simulated and real-world benchmarks demonstrate category-level generalization, cross-task behavior reuse, and robust sim2real deployment, showing that functional canonicalization provides a strong inductive bias for scalable imitation learning in complex manipulation domains. Details of the demo and supplemental material are available on our project website https://sites.google.com/view/funcanon.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.19102

Country: Europe > Germany (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.30)

Add feedback

Prepare Before You Act: Learning From Humans to Rearrange Initial States

Dai, Yinlong, Keyser, Andre, Losey, Dylan P.

arXiv.org Artificial IntelligenceSep-23-2025

Imitation learning (IL) has proven effective across a wide range of manipulation tasks. However, IL policies often struggle when faced with out-of-distribution observations; for instance, when the target object is in a previously unseen position or occluded by other objects. In these cases, extensive demonstrations are needed for current IL methods to reach robust and generalizable behaviors. But when humans are faced with these sorts of atypical initial states, we often rearrange the environment for more favorable task execution. For example, a person might rotate a coffee cup so that it is easier to grasp the handle, or push a box out of the way so they can directly grasp their target object. In this work we seek to equip robot learners with the same capability: enabling robots to prepare the environment before executing their given policy. We propose ReSET, an algorithm that takes initial states -- which are outside the policy's distribution -- and autonomously modifies object poses so that the restructured scene is similar to training data. Theoretically, we show that this two step process (rearranging the environment before rolling out the given policy) reduces the generalization gap. Practically, our ReSET algorithm combines action-agnostic human videos with task-agnostic teleoperation data to i) decide when to modify the scene, ii) predict what simplifying actions a human would take, and iii) map those predictions into robot action primitives. Comparisons with diffusion policies, VLAs, and other baselines show that using ReSET to prepare the environment enables more robust task execution with equal amounts of total training data. See videos at our project website: https://reset2025paper.github.io/

artificial intelligence, machine learning, reduction policy, (17 more...)

arXiv.org Artificial Intelligence

2509.18043

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Embodied Intelligence in Disassembly: Multimodal Perception Cross-validation and Continual Learning in Neuro-Symbolic TAMP

He, Ziwen, Wang, Zhigang, Peng, Yanlong, Chang, Pengxu, Yang, Hong, Chen, Ming

arXiv.org Artificial IntelligenceSep-16-2025

Abstract-- With the rapid development of the new energy vehicle industry, the efficient disassembly and recycling of power batteries have become a critical challenge for the circular economy. In current unstructured disassembly scenarios, the dynamic nature of the environment severely limits the robustness of robotic perception, posing a significant barrier to autonomous disassembly in industrial applications. This paper proposes a continual learning framework based on Neuro-Symbolic task and motion planning (T AMP) to enhance the adaptability of embodied intelligence systems in dynamic environments. Our approach integrates a multimodal perception cross-validation mechanism into a bidirectional reasoning flow: the forward working flow dynamically refines and optimizes action strategies, while the backward learning flow autonomously collects effective data from historical task executions to facilitate continual system learning, enabling self-optimization. Experimental results show that the proposed framework improves the task success rate in dynamic disassembly scenarios from 81.68% to 100%, while reducing the average number of perception misjudgments from 3.389 to 1.128. This research provides a new paradigm for enhancing the robustness and adaptability of embodied intelligence in complex industrial environments. I. INTRODUCTION With the rapid development of Industry 4.0 and the circular economy, industrial disassembly has become a critical link in intelligent manufacturing and resource recycling, facing unprecedented technical challenges [1], [2].

artificial intelligence, execution, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.1127

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Transportation > Electric Vehicle (0.91)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.63)

Add feedback

OpenEgo: A Large-Scale Multimodal Egocentric Dataset for Dexterous Manipulation

Jawaid, Ahad, Xiang, Yu

arXiv.org Artificial IntelligenceSep-9-2025

Egocentric human videos provide scalable demonstrations for imitation learning, but existing corpora often lack either fine-grained, temporally localized action descriptions or dexterous hand annotations. We introduce OpenEgo, a multimodal egocentric manipulation dataset with standardized hand-pose annotations and intention-aligned action primitives. OpenEgo totals 1107 hours across six public datasets, covering 290 manipulation tasks in 600+ environments. We unify hand-pose layouts and provide descriptive, timestamped action primitives. To validate its utility, we train language-conditioned imitation-learning policies to predict dexterous hand trajectories. OpenEgo is designed to lower the barrier to learning dexterous manipulation from egocentric video and to support reproducible research in vision-language-action learning. All resources and instructions will be released at www.openegocentric.com.

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2509.05513

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.61)

Add feedback

Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives

Neural Information Processing SystemsAug-17-2025, 01:18:29 GMT

These parameterized primitives are expressive, simple to implement, enable efficient exploration and can be transferred across robots, tasks and environments.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Learning Efficient Robotic Garment Manipulation with Standardization

Zhou, Changshi, Luan, Feng, Hu, Jiarui, Meng, Shaoqiang, Wang, Zhipeng, Dong, Yanchao, Zhou, Yanmin, He, Bin

arXiv.org Artificial IntelligenceJul-1-2025

Garment manipulation is a significant challenge for robots due to the complex dynamics and potential self-occlusion of garments. Most existing methods of efficient garment unfolding overlook the crucial role of standardization of flattened garments, which could significantly simplify downstream tasks like folding, ironing, and packing. This paper presents APS-Net, a novel approach to garment manipulation that combines unfolding and standardization in a unified framework. APS-Net employs a dual-arm, multi-primitive policy with dynamic fling to quickly unfold crumpled garments and pick-and-place (p and p) for precise alignment. The purpose of garment standardization during unfolding involves not only maximizing surface coverage but also aligning the garment's shape and orientation to predefined requirements. To guide effective robot learning, we introduce a novel factorized reward function for standardization, which incorporates garment coverage (Cov), keypoint distance (KD), and intersection-over-union (IoU) metrics. Additionally, we introduce a spatial action mask and an Action Optimized Module to improve unfolding efficiency by selecting actions and operation points effectively. In simulation, APS-Net outperforms state-of-the-art methods for long sleeves, achieving 3.9 percent better coverage, 5.2 percent higher IoU, and a 0.14 decrease in KD (7.09 percent relative reduction). Real-world folding tasks further demonstrate that standardization simplifies the folding process. Project page: see https://hellohaia.github.io/APS/

artificial intelligence, garment, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2506.22769

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > Canada (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Spain > Aragón (0.04)

Genre: Research Report > Promising Solution (0.54)

Industry:

Health & Medicine (0.93)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.68)

Add feedback

Predictability-Based Curiosity-Guided Action Symbol Discovery

Kilic, Burcu, Ahmetoglu, Alper, Ugur, Emre

arXiv.org Artificial IntelligenceMay-27-2025

Discovering symbolic representations for skills is essential for abstract reasoning and efficient planning in robotics. Previous neuro-symbolic robotic studies mostly focused on discovering perceptual symbolic categories given a pre-defined action repertoire and generating plans with given action symbols. A truly developmental robotic system, on the other hand, should be able to discover all the abstractions required for the planning system with minimal human intervention. In this study, we propose a novel system that is designed to discover symbolic action primitives along with perceptual symbols autonomously. Our system is based on an encoder-decoder structure that takes object and action information as input and predicts the generated effect. To efficiently explore the vast continuous action parameter space, we introduce a Curiosity-Based exploration module that selects the most informative actions -- the ones that maximize the entropy in the predicted effect distribution. The discovered symbolic action primitives are then used to make plans using a symbolic tree search strategy in single- and double-object manipulation tasks. We compare our model with two baselines that use different exploration strategies in different experiments. The results show that our approach can learn a diverse set of symbolic action primitives, which are effective for generating plans in order to achieve given manipulation goals.

artificial intelligence, machine learning, planning & scheduling, (19 more...)

arXiv.org Artificial Intelligence

2505.18248

Country: